Mining tree-query associations in graphs
Abstract
New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for mining tree queries and tree-query associations in a large data graph. What is novel about this class of patterns is that they can contain constants, and that they can contain existential nodes, which are not counted when determining the number of occurrences of the pattern in the data graph. Our algorithms have a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions, and citation analysis.
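To make the role of constants and existential nodes concrete, here is a minimal Python sketch of counting occurrences of a small tree-shaped pattern in a directed data graph. It is an illustration only, not the authors' algorithm or their SQL implementation. It assumes conjunctive-query (homomorphism) semantics, in which an occurrence is a distinct binding of the counted pattern nodes. The function name `count_occurrences`, its parameters, and the toy graph are all hypothetical.

```python
from itertools import product


def count_occurrences(edges, pattern_edges, existential=frozenset(), constants=None):
    """Count distinct bindings of the counted (non-existential, non-constant)
    pattern nodes for which the whole pattern maps into the data graph.

    Assumption: matching is by homomorphism, as for conjunctive queries.
    This brute-force search is only suitable for tiny examples.
    """
    constants = constants or {}
    edge_set = set(edges)
    data_nodes = sorted({n for e in edges for n in e})
    pattern_nodes = sorted({n for e in pattern_edges for n in e})
    # Nodes that still need a value: everything that is not fixed to a constant.
    free = [n for n in pattern_nodes if n not in constants]
    # Nodes that contribute to the count: free nodes that are not existential.
    counted = [n for n in free if n not in existential]

    answers = set()
    for values in product(data_nodes, repeat=len(free)):
        binding = dict(constants)
        binding.update(zip(free, values))
        if all((binding[a], binding[b]) in edge_set for a, b in pattern_edges):
            # Project away existential nodes before recording the occurrence.
            answers.add(tuple(binding[n] for n in counted))
    return len(answers)


# Hypothetical toy data graph with three directed edges.
edges = [("a", "b"), ("a", "c"), ("d", "b")]
pattern = [("x", "y")]  # a single pattern edge x -> y

all_counted = count_occurrences(edges, pattern)
y_existential = count_occurrences(edges, pattern, existential={"y"})
y_constant = count_occurrences(edges, pattern, constants={"y": "c"})
```

In this toy graph the pattern matches three edges when both nodes are counted, but only two distinct values of `x` remain once `y` is existential, because the matches are projected onto `x` alone. Fixing `y` to the constant `"c"` leaves a single occurrence.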
Similar resources
Graph Indexing: Tree + Delta >= Graph
Recent scientific and technological advances have witnessed an abundance of structural patterns modeled as graphs. As a result, it is of special interest to process graph containment queries effectively on large graph databases. Given a graph database G, and a query graph q, the graph containment query is to retrieve all graphs in G which contain q as subgraph(s). Due to the vast number of grap...
Scalable Evaluation of k-NN Queries on Large Uncertain Graphs
Large graphs are prevalent in social networks, traffic networks, and biology. These graphs are often inexact. For example, in a friendship network, an edge between two nodes u and v indicates that users u and v have a close relationship. This edge may only exist with a probability. To model such information, the uncertain graph model has been proposed, in which each edge e is augmented with a pro...
Kernel-based Similarity Search in Massive Graph Databases with Wavelet Trees
Similarity search in databases of labeled graphs is a fundamental task in managing graph data such as XML, chemical compounds and social networks. Typically, a graph is decomposed to a set of substructures (e.g., paths, trees and subgraphs) and a similarity measure is defined via the number of common substructures. Using the representation, graphs can be stored in a document database by regardi...
Towards Memory-Efficient Answering of Tree-Shaped SPARQL Queries using GPUs
We present an idea of efficient query answering over an RDF dataset employing a consumer-grade graphic card for an efficient computation. We consider tree-shaped SPARQL queries and static datasets, to facilitate data mining over RDF graphs in warehouse-like setups. Reasons to see the poster: a) presentation of the approach with examples; b) possibility of discussion about the implementation det...
Analyzing SQL Query Logs using Multi-Relational Graphs
Computer Science 6 (Data Management), FAU Erlangen-Nürnberg {andreas.wahl|richard.lenz}@fau.de Analytical SQL queries are a valuable source of information. They contain expert knowledge that cannot be inferred from schemas or content alone. Consider, for example, data lake scenarios, where relational and semi-structured data sources are combined in a single storage and processing environment. D...
Journal title:
- CoRR
Volume: abs/1008.2626
Pages: -
Publication date: 2010